CORRECTION AND SCALING VECTORS WITH
PARTITIONING OF THE ACOUSTIC SPACE IN
THE DOMAIN OF NOISY SPEECH

BACKGROUND OF THE INVENTION

The present invention relates to noise 
reduction. In particular, the present invention 
relates to removing noise from signals used in 
pattern recognition. 

A pattern recognition system, such as a
speech recognition system, takes an input signal and
attempts to decode the signal to find a pattern
represented by the signal. For example, in a speech
recognition system, a speech signal (often referred
to as a test signal) is received by the recognition
system and is decoded to identify a string of words
represented by the speech signal.

To decode the incoming test signal, most
recognition systems utilize one or more models that
describe the likelihood that a portion of the test
signal represents a particular pattern. Examples of
such models include Neural Nets, Dynamic Time
Warping, segment models, and Hidden Markov Models.

Before a model can be used to decode an
incoming signal, it must be trained. This is
typically done by measuring input training signals
generated from a known training pattern. For
example, in speech recognition, a collection of
speech signals is generated by speakers reading from
a known text. These speech signals are then used to
train the models.

In order for the models to work optimally,
the signals used to train the model should be similar
to the eventual test signals that are decoded. In
particular, the training signals should have the same
amount and type of noise as the test signals that are
decoded.

Typically, the training signal is collected
under "clean" conditions and is considered to be
relatively noise free. To achieve this same low
level of noise in the test signal, many prior art
systems apply noise reduction techniques to the
testing data. In particular, many prior art speech
recognition systems use a noise reduction technique
known as spectral subtraction.

In spectral subtraction, noise samples are
collected from the speech signal during pauses in the
speech. The spectral content of these samples is
then subtracted from the spectral representation of
the speech signal. The difference in the spectral
values represents the noise-reduced speech signal.
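The subtraction step described above can be sketched in a few lines of Python. This is only an illustration, not a prior art implementation: the function name `spectral_subtract`, the use of magnitude spectra stored as plain lists, the averaging of the pause frames, and the `floor` clamp that keeps results non-negative are all assumptions introduced here.

```python
def spectral_subtract(speech_spectrum, noise_frames, floor=0.0):
    """Subtract an average noise spectrum from one speech frame.

    speech_spectrum: magnitudes per frequency bin for one frame.
    noise_frames: spectra collected during pauses in the speech.
    floor: hypothetical lower bound so no bin becomes negative.
    """
    n_bins = len(speech_spectrum)
    # Average the noise magnitude in each bin over the pause frames.
    noise_mean = [
        sum(frame[b] for frame in noise_frames) / len(noise_frames)
        for b in range(n_bins)
    ]
    # Subtract bin by bin and clamp at the floor.
    return [max(s - n, floor) for s, n in zip(speech_spectrum, noise_mean)]
```

Because the noise estimate is fixed once the pause frames are chosen, a burst of noise that occurs outside those pauses is not reflected in `noise_mean`, which is the limitation discussed next.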

Because spectral subtraction estimates the
noise from samples taken during a limited part of the
speech signal, it does not completely remove the
noise if the noise is changing over time. For
example, spectral subtraction is unable to remove
sudden bursts of noise such as a door shutting or a
car driving past the speaker.

In another technique for removing noise,
the prior art identifies a set of correction vectors
from a stereo signal formed of two channel signals,
each channel containing the same pattern signal. One
of the channel signals is "clean" and the other
includes additive noise. Using feature vectors that
represent frames of these channel signals, a
collection of noise correction vectors is determined
by subtracting feature vectors of the noisy channel
signal from feature vectors of the clean channel
signal. When a feature vector of a noisy pattern
signal, either a training signal or a test signal, is
later received, a suitable correction vector is added
to the feature vector to produce a noise reduced
feature vector.

Under the prior art, each correction
vector is associated with a mixture component. To
form the mixture component, the prior art divides the
feature vector space defined by the clean channel's
feature vectors into a number of different mixture
components. When a feature vector for a noisy pattern
signal is later received, it is compared to the
distribution of clean channel feature vectors in each
mixture component to identify a mixture component
that best suits the feature vector. However, because
the clean channel feature vectors do not include
noise, the shapes of the distributions generated
under the prior art are not ideal for finding a
mixture component that best suits a feature vector
from a noisy pattern signal.

In addition, the correction vectors of the
prior art only provide an additive element for
removing noise from a pattern signal. As such, these
prior art systems are less than ideal at removing
noise that is scaled to the noisy pattern signal
itself.

In light of this, a noise reduction 
technique is needed that is more effective at 
removing noise from pattern signals. 

SUMMARY OF THE INVENTION

A method and apparatus are provided for
reducing noise in a training signal and/or test
signal used in a pattern recognition system. The
noise reduction technique uses a stereo signal formed
of two channel signals, each channel containing the
same pattern signal. One of the channel signals is
"clean" and the other includes additive noise. Using
feature vectors from these channel signals, a
collection of noise correction and scaling vectors is
determined. When a feature vector of a noisy pattern
signal is later received, it is multiplied by the
best scaling vector for that feature vector and the
product is added to the best correction vector to
produce a noise reduced feature vector. Under one
embodiment, the best scaling and correction vectors
are identified by choosing an optimal mixture
component for the noisy feature vector. The optimal
mixture component is selected based on a
distribution of noisy channel feature vectors
associated with each mixture component.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one computing
environment in which the present invention may be
practiced.

FIG. 2 is a block diagram of an alternative
computing environment in which the present invention
may be practiced.

FIG. 3 is a flow diagram of a method of
training a noise reduction system of the present
invention.

FIG. 4 is a block diagram of components 
used in one embodiment of the present invention to 
train a noise reduction system. 

FIG. 5 is a flow diagram of one embodiment
of a method of using a noise reduction system of the
present invention.

FIG. 6 is a block diagram of a pattern
recognition system in which the present invention may
be used.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates an example of a suitable
computing system environment 100 on which the
invention may be implemented. The computing system
environment 100 is only one example of a suitable
computing environment and is not intended to suggest
any limitation as to the scope of use or
functionality of the invention. Neither should the
computing environment 100 be interpreted as having
any dependency or requirement relating to any one or
combination of components illustrated in the
exemplary operating environment 100.

The invention is operational with numerous
other general purpose or special purpose computing
system environments or configurations. Examples of
well known computing systems, environments, and/or
configurations that may be suitable for use with the
invention include, but are not limited to, personal
computers, server computers, hand-held or laptop
devices, multiprocessor systems, microprocessor-based
systems, set top boxes, programmable consumer
electronics, network PCs, minicomputers, mainframe
computers, distributed computing environments that
include any of the above systems or devices, and the
like.

The invention may be described in the
general context of computer-executable instructions,
such as program modules, being executed by a
computer. Generally, program modules include
routines, programs, objects, components, data
structures, etc. that perform particular tasks or
implement particular abstract data types. The
invention may also be practiced in distributed
computing environments where tasks are performed by
remote processing devices that are linked through a
communications network. In a distributed computing
environment, program modules may be located in both
local and remote computer storage media including
memory storage devices.

With reference to FIG. 1, an exemplary
system for implementing the invention includes a
general purpose computing device in the form of a
computer 110. Components of computer 110 may
include, but are not limited to, a processing unit
120, a system memory 130, and a system bus 121 that
couples various system components including the
system memory to the processing unit 120. The system
bus 121 may be any of several types of bus structures
including a memory bus or memory controller, a
peripheral bus, and a local bus using any of a
variety of bus architectures. By way of example, and
not limitation, such architectures include Industry
Standard Architecture (ISA) bus, Micro Channel
Architecture (MCA) bus, Enhanced ISA (EISA) bus,
Video Electronics Standards Association (VESA) local
bus, and Peripheral Component Interconnect (PCI) bus
also known as Mezzanine bus.

Computer 110 typically includes a variety
of computer readable media. Computer readable media
can be any available media that can be accessed by
computer 110 and includes both volatile and
nonvolatile media, removable and non-removable media.
By way of example, and not limitation, computer
readable media may comprise computer storage media
and communication media. Computer storage media
includes both volatile and nonvolatile, removable and
non-removable media implemented in any method or
technology for storage of information such as
computer readable instructions, data structures,
program modules or other data. Computer storage
media includes, but is not limited to, RAM, ROM,
EEPROM, flash memory or other memory technology,
CD-ROM, digital versatile disks (DVD) or other optical
disk storage, magnetic cassettes, magnetic tape,
magnetic disk storage or other magnetic storage
devices, or any other medium which can be used to
store the desired information and which can be
accessed by computer 100. Communication media
typically embodies computer readable instructions,
data structures, program modules or other data in a
modulated data signal such as a carrier wave or other
transport mechanism and includes any information
delivery media. The term "modulated data signal"
means a signal that has one or more of its
characteristics set or changed in such a manner as to
encode information in the signal. By way of example,
and not limitation, communication media includes
wired media such as a wired network or direct-wired
connection, and wireless media such as acoustic, RF,
infrared and other wireless media. Combinations of
any of the above should also be included within the
scope of computer readable media.

The system memory 130 includes computer
storage media in the form of volatile and/or
nonvolatile memory such as read only memory (ROM) 131
and random access memory (RAM) 132. A basic
input/output system 133 (BIOS), containing the basic
routines that help to transfer information between
elements within computer 110, such as during
start-up, is typically stored in ROM 131. RAM 132
typically contains data and/or program modules that
are immediately accessible to and/or presently being
operated on by processing unit 120. By way of
example, and not limitation, FIG. 1 illustrates
operating system 134, application programs 135, other
program modules 136, and program data 137.

The computer 110 may also include other
removable/non-removable volatile/nonvolatile computer
storage media. By way of example only, FIG. 1
illustrates a hard disk drive 141 that reads from or
writes to non-removable, nonvolatile magnetic media,
a magnetic disk drive 151 that reads from or writes
to a removable, nonvolatile magnetic disk 152, and an
optical disk drive 155 that reads from or writes to a
removable, nonvolatile optical disk 156 such as a CD
ROM or other optical media. Other
removable/non-removable, volatile/nonvolatile computer storage
media that can be used in the exemplary operating
environment include, but are not limited to, magnetic
tape cassettes, flash memory cards, digital versatile
disks, digital video tape, solid state RAM, solid
state ROM, and the like. The hard disk drive 141 is
typically connected to the system bus 121 through a
non-removable memory interface such as interface 140,
and magnetic disk drive 151 and optical disk drive
155 are typically connected to the system bus 121 by
a removable memory interface, such as interface 150.

The drives and their associated computer
storage media discussed above and illustrated in FIG.
1, provide storage of computer readable instructions,
data structures, program modules and other data for
the computer 110. In FIG. 1, for example, hard disk
drive 141 is illustrated as storing operating system
144, application programs 145, other program modules
146, and program data 147. Note that these
components can either be the same as or different
from operating system 134, application programs 135,
other program modules 136, and program data 137.
Operating system 144, application programs 145, other
program modules 146, and program data 147 are given
different numbers here to illustrate that, at a
minimum, they are different copies.

A user may enter commands and information
into the computer 110 through input devices such as a
keyboard 162, a microphone 163, and a pointing device
161, such as a mouse, trackball or touch pad. Other
input devices (not shown) may include a joystick,
game pad, satellite dish, scanner, or the like.
These and other input devices are often connected to
the processing unit 120 through a user input
interface 160 that is coupled to the system bus, but
may be connected by other interface and bus
structures, such as a parallel port, game port or a
universal serial bus (USB). A monitor 191 or other
type of display device is also connected to the
system bus 121 via an interface, such as a video
interface 190. In addition to the monitor, computers
may also include other peripheral output devices such
as speakers 197 and printer 196, which may be
connected through an output peripheral interface 190.

The computer 110 may operate in a networked
environment using logical connections to one or more
remote computers, such as a remote computer 180. The
remote computer 180 may be a personal computer, a
hand-held device, a server, a router, a network PC, a
peer device or other common network node, and
typically includes many or all of the elements
described above relative to the computer 110. The
logical connections depicted in FIG. 1 include a
local area network (LAN) 171 and a wide area network
(WAN) 173, but may also include other networks. Such
networking environments are commonplace in offices,
enterprise-wide computer networks, intranets and the
Internet.

When used in a LAN networking environment,
the computer 110 is connected to the LAN 171 through
a network interface or adapter 170. When used in a
WAN networking environment, the computer 110
typically includes a modem 172 or other means for
establishing communications over the WAN 173, such as
the Internet. The modem 172, which may be internal
or external, may be connected to the system bus 121
via the user input interface 160, or other
appropriate mechanism. In a networked environment,
program modules depicted relative to the computer
110, or portions thereof, may be stored in the remote
memory storage device. By way of example, and not
limitation, FIG. 1 illustrates remote application
programs 185 as residing on remote computer 180. It
will be appreciated that the network connections
shown are exemplary and other means of establishing a
communications link between the computers may be
used.

FIG. 2 is a block diagram of a mobile
device 200, which is an exemplary computing
environment. Mobile device 200 includes a
microprocessor 202, memory 204, input/output (I/O)
components 206, and a communication interface 208 for
communicating with remote computers or other mobile
devices. In one embodiment, the afore-mentioned
components are coupled for communication with one
another over a suitable bus 210.

Memory 204 is implemented as non-volatile
electronic memory such as random access memory (RAM)
with a battery back-up module (not shown) such that
information stored in memory 204 is not lost when the
general power to mobile device 200 is shut down. A
portion of memory 204 is preferably allocated as
addressable memory for program execution, while
another portion of memory 204 is preferably used for
storage, such as to simulate storage on a disk drive.

Memory 204 includes an operating system
212, application programs 214 as well as an object
store 216. During operation, operating system 212 is
preferably executed by processor 202 from memory 204.
Operating system 212, in one preferred embodiment, is
a WINDOWS® CE brand operating system commercially
available from Microsoft Corporation. Operating
system 212 is preferably designed for mobile devices,
and implements database features that can be utilized
by applications 214 through a set of exposed
application programming interfaces and methods. The
objects in object store 216 are maintained by
applications 214 and operating system 212, at least
partially in response to calls to the exposed
application programming interfaces and methods.

Communication interface 208 represents
numerous devices and technologies that allow mobile
device 200 to send and receive information. The
devices include wired and wireless modems, satellite
receivers and broadcast tuners to name a few. Mobile
device 200 can also be directly connected to a
computer to exchange data therewith. In such cases,
communication interface 208 can be an infrared
transceiver or a serial or parallel communication
connection, all of which are capable of transmitting
streaming information.

Input/output components 206 include a
variety of input devices such as a touch-sensitive
screen, buttons, rollers, and a microphone as well as
a variety of output devices including an audio
generator, a vibrating device, and a display. The
devices listed above are by way of example and need
not all be present on mobile device 200. In
addition, other input/output devices may be attached
to or found with mobile device 200 within the scope
of the present invention.

Under the present invention, a system and
method are provided that reduce noise in pattern
recognition signals. To do this, the present
invention identifies a collection of scaling vectors,
$S_k$, and correction vectors, $r_k$, that can be
respectively multiplied by and added to a feature
vector representing a portion of a noisy pattern
signal to produce a feature vector representing a
portion of a "clean" pattern signal. A method for
identifying the collection of scaling vectors and
correction vectors is described below with reference
to the flow diagram of FIG. 3 and the block diagram
of FIG. 4. A method of applying scaling vectors and
correction vectors to noisy feature vectors is
described below with reference to the flow diagram of
FIG. 5 and the block diagram of FIG. 6.

The method of identifying scaling vectors
and correction vectors begins in step 300 of FIG. 3,
where a "clean" channel signal is converted into a
sequence of feature vectors. To do this, a speaker
400 of FIG. 4 speaks into a microphone 402, which
converts the audio waves into electrical signals.
The electrical signals are then sampled by an
analog-to-digital converter 404 to generate a sequence of
digital values, which are grouped into frames of
values by a frame constructor 406. In one
embodiment, A-to-D converter 404 samples the analog
signal at 16 kHz and 16 bits per sample, thereby
creating 32 kilobytes of speech data per second, and
frame constructor 406 creates a new frame every 10
milliseconds that includes 25 milliseconds worth of
data.
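The sampling and framing figures of this embodiment can be checked with a short Python sketch. The constants come from the text (16 kHz, 16 bits per sample, 25 millisecond frames that start 10 milliseconds apart); the function names and the choice to drop a trailing partial frame are assumptions made only for illustration.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2       # 16 bits per sample
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10


def bytes_per_second(rate_hz=SAMPLE_RATE_HZ, bytes_per_sample=BYTES_PER_SAMPLE):
    """Data rate produced by the A-to-D converter."""
    return rate_hz * bytes_per_sample


def build_frames(samples, rate_hz=SAMPLE_RATE_HZ,
                 length_ms=FRAME_LENGTH_MS, shift_ms=FRAME_SHIFT_MS):
    """Group samples into overlapping frames (hypothetical helper)."""
    frame_len = rate_hz * length_ms // 1000   # 400 samples at 16 kHz
    shift = rate_hz * shift_ms // 1000        # 160 samples at 16 kHz
    frames = []
    start = 0
    # Assumption: a trailing partial frame is simply dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames
```

With these settings each frame holds 400 samples and consecutive frames overlap by 240 samples, so one second of audio yields 98 complete frames.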

Each frame of data provided by frame
constructor 406 is converted into a feature vector by
a feature extractor 408. Examples of feature
extraction modules include modules for performing
Linear Predictive Coding (LPC), LPC derived cepstrum,
Perceptive Linear Prediction (PLP), Auditory model
feature extraction, and Mel-Frequency Cepstrum
Coefficients (MFCC) feature extraction. Note that
the invention is not limited to these feature
extraction modules and that other modules may be used
within the context of the present invention.

In step 302 of FIG. 3, a noisy channel
signal is converted into feature vectors. Although
the conversion of step 302 is shown as occurring
after the conversion of step 300, any part of the
conversion may be performed before, during or after
step 300 under the present invention. The conversion
of step 302 is performed through a process similar to
that described above for step 300.

In the embodiment of FIG. 4, this process
begins when the same speech signal generated by
speaker 400 is provided to a second microphone 410.
This second microphone also receives an additive
noise signal from an additive noise source 412.
Microphone 410 converts the speech and noise signals
into a single electrical signal, which is sampled by
an analog-to-digital converter 414. The sampling
characteristics for A/D converter 414 are the same as
those described above for A/D converter 404. The
samples provided by A/D converter 414 are collected
into frames by a frame constructor 416, which acts in
a manner similar to frame constructor 406. These
frames of samples are then converted into feature
vectors by a feature extractor 418, which uses the
same feature extraction method as feature extractor
408.

In other embodiments, microphone 410, A/D 
converter 414, frame constructor 416 and feature 
extractor 418 are not present. Instead, the additive 
noise is added to a stored version of the speech 
signal at some point within the processing chain 
formed by microphone 402, A/D converter 404, frame 
constructor 406, and feature extractor 408. For 
example, the analog version of the "clean" channel 
signal may be stored after it is created by 
microphone 402. The original "clean" channel signal 
is then applied to A/D converter 404, frame 
constructor 406, and feature extractor 408. When 
that process is complete, an analog noise signal is 
added to the stored "clean" channel signal to form a 
noisy analog channel signal. This noisy signal is 
then applied to A/D converter 404, frame constructor 
406, and feature extractor 408 to form the feature 
vectors for the noisy channel signal. 

In other embodiments, digital samples of
noise are added to stored digital samples of the
"clean" channel signal between A/D converter 404 and
frame constructor 406, or frames of digital noise
samples are added to stored frames of "clean" channel
samples after frame constructor 406. In still further
embodiments, the frames of "clean" channel samples
are converted into the frequency domain and the
spectral content of additive noise is added to the
frequency-domain representation of the "clean"
channel signal. This produces a frequency-domain
representation of a noisy channel signal that can be
used for feature extraction.

The feature vectors for the noisy channel
signal and the "clean" channel signal are provided to
a noise reduction trainer 420 in FIG. 4. At step 304
of FIG. 3, noise reduction trainer 420 groups the
feature vectors for the noisy channel signal into
mixture components. This grouping can be done by
grouping feature vectors of similar noises together
using a maximum likelihood training technique or by
grouping feature vectors that represent a temporal
section of the speech signal together. Those skilled
in the art will recognize that other techniques for
grouping the feature vectors may be used and that the
two techniques listed above are only provided as
examples.

After the feature vectors of the noisy
channel signal have been grouped into mixture
components, noise reduction trainer 420 generates a
set of distribution values that are indicative of the
distribution of the feature vectors within each
mixture component. This is shown as step 306 in FIG.
3. In many embodiments, this involves determining a
mean vector and a standard deviation vector, with one
entry for each vector component in the feature vectors
of each mixture component. In an embodiment in which
maximum likelihood training is used to group the
feature vectors, the means and standard deviations are
provided as by-products of identifying the groups for
the mixture components.
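As a rough illustration of step 306, the following Python sketch computes a mean vector and a standard deviation vector for each mixture component from noisy feature vectors that have already been grouped. The hard assignment of each vector to a single component through a `labels` list, the population form of the standard deviation (dividing by the group size), and all names are assumptions made here; the text does not specify them.

```python
import math


def component_statistics(vectors, labels, num_components):
    """Per-component mean and standard deviation vectors.

    vectors: noisy-channel feature vectors (lists of floats).
    labels: hypothetical hard assignment of each vector to a component.
    Returns {k: (mean_vector, std_vector)} for non-empty components.
    """
    dim = len(vectors[0])
    stats = {}
    for k in range(num_components):
        members = [v for v, lab in zip(vectors, labels) if lab == k]
        if not members:
            continue  # no data for this component
        n = len(members)
        mean = [sum(v[i] for v in members) / n for i in range(dim)]
        std = [
            math.sqrt(sum((v[i] - mean[i]) ** 2 for v in members) / n)
            for i in range(dim)
        ]
        stats[k] = (mean, std)
    return stats
```

A component whose standard deviation is zero in some dimension would need special handling before being used in a normal distribution; that case is left out of this sketch.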

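Once the means and standard deviations have
been determined for each mixture component, the noise
reduction trainer 420 determines a correction vector,
$r_k$, and a scaling vector, $S_k$, for each mixture
component, $k$, at step 308 of FIG. 3. Under one
embodiment, the vector components of the scaling
vector and the vector components of the correction
vector for each mixture component are determined
using a weighted least squares estimation technique.
Under this technique, the scaling vector components
are calculated as:

$$S_{i,k} = \frac{\left[\sum_{t=0}^{T-1} p(k|y_{i,t})\, x_{i,t}\, y_{i,t}\right]\left[\sum_{t=0}^{T-1} p(k|y_{i,t})\right] - \left[\sum_{t=0}^{T-1} p(k|y_{i,t})\, x_{i,t}\right]\left[\sum_{t=0}^{T-1} p(k|y_{i,t})\, y_{i,t}\right]}{\left[\sum_{t=0}^{T-1} p(k|y_{i,t})\, y_{i,t}^{2}\right]\left[\sum_{t=0}^{T-1} p(k|y_{i,t})\right] - \left[\sum_{t=0}^{T-1} p(k|y_{i,t})\, y_{i,t}\right]^{2}} \qquad \text{EQ. 1}$$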



and the correction vector components are calculated
as:

$$r_{i,k} = \frac{\sum_{t=0}^{T-1} p(k|y_{i,t})\, x_{i,t} - S_{i,k} \sum_{t=0}^{T-1} p(k|y_{i,t})\, y_{i,t}}{\sum_{t=0}^{T-1} p(k|y_{i,t})} \qquad \text{EQ. 2}$$



Where $S_{i,k}$ is the i-th vector component of a
scaling vector, $S_k$, for mixture component k, $r_{i,k}$ is
the i-th vector component of a correction vector, $r_k$,
for mixture component k, $y_{i,t}$ is the i-th vector
component for the feature vector in the t-th frame of
the noisy channel signal, $x_{i,t}$ is the i-th vector
component for the feature vector in the t-th frame of
the "clean" channel signal, T is the total number of
frames in the "clean" and noisy channel signals, and
$p(k|y_{i,t})$ is the probability of the k-th mixture component
given the feature vector component for the t-th frame
of the noisy channel signal.
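The weighted least squares estimate described above can be sketched for a single vector component i and a single mixture component k. The Python below assumes the weights $p(k|y_{i,t})$ have already been computed and are passed in as a list; the function name and the error raised for a degenerate denominator are choices made here for illustration, not details from the text.

```python
def fit_scale_and_correction(y, x, weights):
    """Weighted least squares fit of x[t] ~ s * y[t] + r.

    y: noisy-channel values of one vector component, one per frame.
    x: "clean"-channel values of the same component, one per frame.
    weights: p(k | y[t]) for the mixture component being trained.
    Returns (s, r): the scaling and correction values.
    """
    sw = sum(weights)
    swy = sum(w * yt for w, yt in zip(weights, y))
    swx = sum(w * xt for w, xt in zip(weights, x))
    swxy = sum(w * xt * yt for w, xt, yt in zip(weights, x, y))
    swyy = sum(w * yt * yt for w, yt in zip(weights, y))

    denom = swyy * sw - swy * swy
    if denom == 0:
        # Assumption: treat a degenerate fit as an error.
        raise ValueError("noisy values carry no spread under these weights")

    s = (swxy * sw - swx * swy) / denom   # scaling component
    r = (swx - s * swy) / sw              # correction component
    return s, r
```

When the clean values are an exact scaled and shifted copy of the noisy values, the fit recovers that scale and shift regardless of the weights, which gives a convenient sanity check.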

In equations 1 and 2, the $p(k|y_{i,t})$ term
provides a weighting function that indicates the
relative relationship between the k-th mixture
component and the current frame of the channel
signals.

The $p(k|y_{i,t})$ term can be calculated using
Bayes' theorem as:

$$p(k|y_{i,t}) = \frac{p(y_{i,t}|k)\, p(k)}{\sum_{\text{all } k} p(y_{i,t}|k)\, p(k)} \qquad \text{EQ. 3}$$

Where $p(y_{i,t}|k)$ is the probability of the i-th
vector component in the noisy feature vector given
the k-th mixture component, and $p(k)$ is the probability
of the k-th mixture component.

The probability of the i-th vector component
in the noisy feature vector given the k-th mixture
component, $p(y_{i,t}|k)$, can be determined using a normal
distribution based on the distribution values
determined for the k-th mixture component in step 306
of FIG. 3. In one embodiment, the probability of the
k-th mixture component, $p(k)$, is simply the inverse of
the number of mixture components. For example, in an
embodiment that has 256 mixture components, the
probability of any one mixture component is 1/256.
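A minimal Python sketch of this weighting term follows, assuming a scalar normal density for each mixture component and, by default, the uniform prior described above. The function names are hypothetical, and no protection is included for the case where every likelihood underflows to zero.

```python
import math


def normal_pdf(value, mean, std):
    """Density of a scalar normal distribution."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def component_posteriors(y_i, means, stds, priors=None):
    """p(k | y_i) for every mixture component k, via Bayes' theorem.

    means, stds: per-component mean and standard deviation of the
    i-th vector component, as found from the noisy channel.
    priors: p(k); defaults to 1 / (number of components).
    """
    num = len(means)
    if priors is None:
        priors = [1.0 / num] * num
    joint = [normal_pdf(y_i, m, s) * p
             for m, s, p in zip(means, stds, priors)]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, with two components centered at 0 and 4 and equal standard deviations, a value of 2 is equally likely under both, so each posterior is 0.5.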

After a correction vector and a scaling
vector have been determined for each mixture
component at step 308, the process of training the
noise reduction system of the present invention is
complete. The correction vectors, scaling vectors,
and distribution values for each mixture component
are then stored in a noise reduction parameter
storage 422 of FIG. 4.

Once the correction vector and scaling
vector have been determined for each mixture
component, the vectors may be used in a noise
reduction technique of the present invention. In
particular, the correction vectors and scaling
vectors may be used to remove noise in a training
signal and/or test signal used in pattern
recognition.

FIG. 5 provides a flow diagram that
describes the technique for reducing noise in a
training signal and/or test signal. The process of
FIG. 5 begins at step 500 where a noisy training
signal or test signal is converted into a series of
feature vectors. The noise reduction technique then
determines which mixture component best matches each
noisy feature vector. This is done by applying the
noisy feature vector to a distribution of noisy
channel feature vectors associated with each mixture
component. In one embodiment, this distribution is a
collection of normal distributions defined by the
mixture component's mean and standard deviation
vectors. The mixture component that provides the
highest probability for the noisy feature vector is
then selected as the best match for the feature
vector. This selection is represented in an equation
as:

$$\hat{k} = \arg\max_{k}\; c_k\, N(y; \mu_k, \Sigma_k) \qquad \text{EQ. 4}$$

Where $\hat{k}$ is the best matching mixture
component, $c_k$ is a weight factor for the k-th mixture
component, and $N(y; \mu_k, \Sigma_k)$ is the value for the individual
noisy feature vector, $y$, from the normal distribution
generated for the mean vector, $\mu_k$, and the standard
deviation vector, $\Sigma_k$, of the k-th mixture component.
In most embodiments, each mixture component is given
an equal weight factor $c_k$.
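The selection of the best matching mixture component can be sketched as follows. This Python assumes that the normal distribution for a component is the product of independent per-dimension normals defined by its mean and standard deviation vectors, and it compares log scores instead of raw probabilities to avoid numerical underflow; both are choices made for this sketch, and the argmax is unchanged by taking logarithms. All names are hypothetical.

```python
import math


def log_normal_diag(y, mean, std):
    """Log density of y under independent per-dimension normals."""
    return sum(
        -0.5 * ((yi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for yi, m, s in zip(y, mean, std)
    )


def best_mixture_component(y, means, stds, weights=None):
    """Index k that maximizes c_k * N(y; mean_k, std_k).

    weights: the c_k factors; equal weights are used by default.
    """
    num = len(means)
    if weights is None:
        weights = [1.0 / num] * num
    scores = [
        math.log(c) + log_normal_diag(y, m, s)
        for c, m, s in zip(weights, means, stds)
    ]
    return max(range(num), key=scores.__getitem__)
```

With equal weights, the `math.log(c)` term is the same for every component and does not affect which index is returned.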

Note that under the present invention, the
mean vector and standard deviation vector for each
mixture component are determined from noisy channel
vectors and not "clean" channel vectors as was done
in the prior art. Because of this, the normal
distributions based on these means and standard
deviations are better shaped for finding a best
mixture component for a noisy pattern vector.

Once the best mixture component for each
input feature vector has been identified at step 502,
the corresponding scaling and correction vectors for
those mixture components are (element by element)
multiplied by and added to the individual feature
vectors to form "clean" feature vectors. In terms of
an equation:

$$x_i = S_{i,k}\, y_i + r_{i,k} \qquad \text{EQ. 5}$$

Where $x_i$ is the i-th vector component of an
individual "clean" feature vector, $y_i$ is the i-th
vector component of an individual noisy feature
vector from the input signal, and $S_{i,k}$ and $r_{i,k}$ are the
i-th vector components of the scaling and correction
vectors, respectively, both optimally selected for
the individual noisy feature vector. The operation
of Equation 5 is repeated for each vector component.
Thus, Equation 5 can be re-written in vector notation
as:

x = S,y + r, EQ. 5 
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where x is the "clean" feature vector, S k is 
the scaling vector, y is the noisy feature vector, 
and r k is the correction vector. 
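As a brief illustration of the element-by-element operation described above, the following sketch multiplies a noisy feature vector by a scaling vector and adds a correction vector. The function name `clean_feature_vector` and the numeric values are hypothetical and chosen only for illustration.

```python
def clean_feature_vector(y, scaling, correction):
    """Compute x_i = S_i * y_i + r_i for every vector component i."""
    if not (len(y) == len(scaling) == len(correction)):
        raise ValueError("all vectors must have the same length")
    return [s_i * y_i + r_i for y_i, s_i, r_i in zip(y, scaling, correction)]


# Hypothetical three-dimensional example.
noisy = [2.0, -1.0, 0.5]
scaling = [0.5, 2.0, 1.0]       # scaling vector of the selected component
correction = [1.0, 0.0, -0.5]   # correction vector of the selected component
print(clean_feature_vector(noisy, scaling, correction))  # prints [2.0, -2.0, 0.0]
```

In use, the scaling and correction vectors passed in would be those associated with the mixture component selected for that particular noisy feature vector.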

FIG. 6 provides a block diagram of an
environment in which the noise reduction technique of
the present invention may be utilized. In
particular, FIG. 6 shows a speech recognition system
in which the noise reduction technique of the present
invention is used to reduce noise in a training
signal used to train an acoustic model and/or to
reduce noise in a test signal that is applied against
an acoustic model to identify the linguistic content
of the test signal.

In FIG. 6, a speaker 600, either a trainer
or a user, speaks into a microphone 604. Microphone
604 also receives additive noise from one or more
noise sources 602. The audio signals detected by
microphone 604 are converted into electrical signals
that are provided to analog-to-digital converter 606.
Although additive noise 602 is shown entering through
microphone 604 in the embodiment of FIG. 6, in other
embodiments, additive noise 602 may be added to the
input speech signal as a digital signal after A-to-D
converter 606.

A-to-D converter 606 converts the analog
signal from microphone 604 into a series of digital
values. In several embodiments, A-to-D converter 606
samples the analog signal at 16 kHz and 16 bits per
sample, thereby creating 32 kilobytes of speech data
per second. These digital values are provided to a
frame constructor 607, which, in one embodiment,
groups the values into 25 millisecond frames that
start 10 milliseconds apart.
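The arithmetic behind these figures can be checked with a short sketch. At 16 kHz and 2 bytes per sample, one second of speech occupies 32,000 bytes; a 25 millisecond frame spans 400 samples, and consecutive frames start 160 samples apart. The names below are hypothetical, and dropping a trailing partial frame is an assumption made for this sketch rather than something stated in the description.

```python
SAMPLE_RATE_HZ = 16_000   # sampling rate given in the description
BYTES_PER_SAMPLE = 2      # 16 bits per sample
FRAME_MS = 25             # frame length in milliseconds
HOP_MS = 10               # spacing between frame starts in milliseconds


def bytes_per_second(sample_rate_hz=SAMPLE_RATE_HZ, bytes_per_sample=BYTES_PER_SAMPLE):
    """Amount of speech data produced per second of audio."""
    return sample_rate_hz * bytes_per_sample


def build_frames(samples, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Group samples into overlapping frames; a trailing partial frame is
    dropped (an assumption of this sketch)."""
    frame_len = sample_rate_hz * frame_ms // 1000   # 400 samples at 16 kHz
    hop_len = sample_rate_hz * hop_ms // 1000       # 160 samples at 16 kHz
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


one_second = list(range(SAMPLE_RATE_HZ))  # placeholder sample values
frames = build_frames(one_second)
print(bytes_per_second())  # prints 32000
print(len(frames))         # prints 98
```

Because each frame is longer than the spacing between frame starts, adjacent frames overlap by 15 milliseconds.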

The frames of data created by frame
constructor 607 are provided to feature extractor
610, which extracts a feature from each frame. The
same feature extraction that was used to train the
noise reduction parameters (the scaling vectors,
correction vectors, means, and standard deviations of
the mixture components) is used in feature extractor
610. As mentioned above, examples of such feature
extraction modules include modules for performing
Linear Predictive Coding (LPC), LPC derived cepstrum,
Perceptive Linear Prediction (PLP), Auditory model
feature extraction, and Mel-Frequency Cepstrum
Coefficients (MFCC) feature extraction.

The feature extraction module produces a
stream of feature vectors that are each associated
with a frame of the speech signal. This stream of
feature vectors is provided to noise reduction module
610 of the present invention, which uses the noise
reduction parameters stored in noise reduction
parameter storage 611 to reduce the noise in the
input speech signal. In particular, as shown in FIG.
5, noise reduction module 610 selects a single
mixture component for each input feature vector and
then multiplies the input feature vector by that
mixture component's scaling vector and adds that
mixture component's correction vector to the product
to produce a "clean" feature vector.

Thus, the output of noise reduction module
610 is a series of "clean" feature vectors. If the
input signal is a training signal, this series of
"clean" feature vectors is provided to a trainer 624,
which uses the "clean" feature vectors and a training
text 626 to train an acoustic model 618. Techniques
for training such models are known in the art and a
description of them is not required for an
understanding of the present invention.

If the input signal is a test signal, the
"clean" feature vectors are provided to a decoder
612, which identifies a most likely sequence of words
based on the stream of feature vectors, a lexicon
614, a language model 616, and the acoustic model
618. The particular method used for decoding is not
important to the present invention and any of several
known methods for decoding may be used.

The most probable sequence of hypothesis
words is provided to a confidence measure module 620.
Confidence measure module 620 identifies which words
are most likely to have been improperly identified by
the speech recognizer, based in part on a secondary
acoustic model (not shown). Confidence measure module
620 then provides the sequence of hypothesis words to
an output module 622 along with identifiers
indicating which words may have been improperly
identified. Those skilled in the art will recognize
that confidence measure module 620 is not necessary
for the practice of the present invention.



Although FIG. 6 depicts a speech 
recognition system, the present invention may be used 
in any pattern recognition system and is not limited 
to speech. 

Although the present invention has been 
described with reference to particular embodiments, 
workers skilled in the art will recognize that 
changes may be made in form and detail without 
departing from the spirit and scope of the invention. 



WHAT IS CLAIMED IS: 

1. A method of noise reduction for reducing 
noise in a noisy input signal, the method comprising: 

fitting a function applied to a sequence 
of noisy channel feature vectors that 
represent a noisy channel signal to a 
sequence of clean channel feature 
vectors that represent a clean channel 
signal to determine at least one 
correction vector and at least one 
scaling vector; 

multiplying the scaling vector by each 
noisy input feature vector of a 
sequence of noisy input feature 
vectors that represent a noisy input 
signal to produce a sequence of scaled 
feature vectors; and 

adding a correction vector to each scaled 
feature vector to form a sequence of 
clean input feature vectors, the 
sequence of clean input feature 
vectors representing a clean input 
signal having less noise than the 
noisy input signal. 

2. The method of claim 1 wherein determining
at least one correction vector and at least one
scaling vector comprises generating a set of
correction and scaling vectors, each correction
vector and scaling vector corresponding to a separate
mixture component of the sequence of noisy channel
feature vectors.

3. The method of claim 2 wherein determining a
correction vector comprises: 

grouping the noisy channel feature vectors 
into at least one mixture component; 

determining a distribution value that is 
indicative of the distribution of the 
noisy channel feature vectors in at 
least one mixture component; and 

using the distribution value for a mixture 
component to determine the correction 
vector and the scaling vector for that 
mixture component.

4. The method of claim 3 wherein using the
distribution value to determine a correction vector
and a scaling vector for a mixture component
comprises:

determining, for each noisy channel feature 
vector, at least one conditional 
mixture probability, the conditional 
mixture probability representing the 
probability of the mixture component 
given the noisy channel feature 
vector, the conditional mixture 
probability based in part on a 
distribution value for the mixture 
component; and

applying the conditional mixture
probability in a linear least squares
calculation.

5. The method of claim 4 wherein determining a 
conditional mixture probability comprises: 

determining a conditional feature vector 
probability that represents the 
probability of a noisy channel feature 
vector given the mixture component, 
the probability based on the 
distribution value for the mixture; 

multiplying the conditional feature vector 
probability by the unconditional 
probability of the mixture component 
to produce a probability product; and 

dividing the probability product by the sum 
of the probability products generated 
for all mixture components for the 
noisy channel feature vector. 

6. The method of claim 5 wherein determining a
conditional feature vector probability comprises
determining the probability from a normal
distribution formed from the distribution value for a
mixture component.

7. The method of claim 6 wherein determining a 
distribution value comprises determining a mean 
vector and determining a standard deviation vector.

8. The method of claim 2 wherein multiplying
the scaling vector by each noisy input feature vector
comprises:

identifying a mixture component for each 
noisy input feature vector; and 

multiplying each noisy input feature vector 
by a scaling vector associated with 
the mixture component.

9. The method of claim 8 wherein adding a 
correction vector comprises adding a correction 
vector associated with the mixture component to each 
scaled feature vector. 

10. The method of claim 9 wherein identifying a 
mixture component comprises identifying the most 
likely mixture component for each noisy input feature 
vector. 

11. The method of claim 10 wherein identifying 
the most likely mixture component comprises: 

grouping the noisy channel feature vectors 
into at least one mixture component; 

determining a distribution value that is 
indicative of the distribution of the 
noisy channel feature vectors in at 
least one mixture component;

for each mixture component, determining a
probability of the noisy input feature
vector given the mixture component
based on a normal distribution formed
from the distribution value for that
mixture component; and

selecting the mixture component that
provides the highest probability as 
the most likely mixture component. 

12. A method of reducing noise in a noisy 
signal, the method comprising: 

identifying a mixture component for a noisy 
feature vector representing a part of 
the noisy signal; 

retrieving a correction vector and a 
scaling vector associated with the 
identified mixture component; 

multiplying the noisy feature vector by the 
scaling vector to form a scaled 
feature vector; and 

adding the correction vector to the scaled 
feature vector to form a clean feature 
vector representing a part of a clean 
signal.

13. The method of claim 12 wherein identifying
a mixture component comprises identifying a most
likely mixture component for a noisy feature vector.

14. The method of claim 13 wherein identifying
a most likely mixture component comprises:

for each mixture component, determining a
probability of the noisy feature
vector given the mixture component;
and

selecting the mixture component that 
provides the highest probability as 
the most likely mixture component. 

15. The method of claim 14 wherein determining 
a probability comprises determining a probability 
based on a distribution of noisy channel feature 
vectors that are assigned to the mixture component.

16. The method of claim 15 wherein determining 
a probability based on a distribution comprises 
determining a probability based on a mean and a 
standard deviation of the distribution. 

17. The method of claim 12 wherein retrieving a 
correction vector and a scaling vector comprises 
retrieving a correction vector and a scaling vector 
formed through fitting a function evaluated on a 
sequence of noisy channel feature vectors to a 
sequence of clean channel feature vectors. 

18. The method of claim 17 wherein fitting the 
function comprises performing a linear least squares 
calculation.

19. The method of claim 18 wherein performing a
linear least squares calculation comprises utilizing
a weight value in the linear least squares
calculation, the weight value providing an indication
of association between a noisy channel feature vector
and a mixture component.

20. The method of claim 19 wherein utilizing a 
weight value comprises: 

determining a conditional probability of a 
mixture component given a noisy 
channel feature vector; and 

using the conditional probability as the 
weight value. 

21. The method of claim 20 wherein determining 
a conditional probability comprises: 

for each mixture component, determining a 
probability of the mixture component 
and determining a feature probability 
that represents the probability of the 
noisy channel feature vector given the 
mixture component;

for each mixture component, multiplying the 
probability of the mixture component 
by the respective feature probability 
for the mixture component to provide a 
respective probability product; 

summing the probability products of the
noisy feature vector for all mixture
components to produce a probability
sum;

multiplying the probability of the mixture 
component associated with the 
correction vector and the scaling 
vector by the probability of the noisy 
feature vector given the mixture 
component associated with the 
correction vector and the scaling 
vector to produce a second probability 
product; and 

dividing the second probability product by 
the probability sum. 

22. A computer-readable medium comprising
computer-executable instructions for reducing noise
in a signal through steps comprising:

using a representation value that
represents a portion of the signal to
identify an optimal mixture
component for that portion;

selecting a correction value and a scaling
value associated with the identified
optimal mixture component; and

multiplying the scaling value by the
representation value to form a
product; and

adding the product to the correction value
to form a noise-reduced value that
represents a portion of a noise-
reduced signal.

23. The computer-readable medium of claim 22 
wherein the step of using a representation value to 
identify an optimal mixture component comprises: 

for each mixture component, applying the 
representation value to a distribution 
of representation values associated 
with the mixture component to generate 
a likelihood of the representation 
value given the mixture component; and 

selecting the mixture component that 
generates the greatest likelihood as 
the optimal mixture component. 

24. A method of generating correction values 
for removing noise from an input signal, the method 
comprising:

accessing a set of noisy channel vectors 
representing a noisy channel signal; 

accessing a set of clean channel vectors 
representing a clean channel signal; 

grouping the noisy channel vectors into a 
plurality of mixture components; and 

determining a correction value for each 
mixture component based on the set of 
noisy channel vectors and the set of 
clean channel vectors.

25. The method of claim 24 wherein determining
a correction value comprises fitting a function based
on the noisy channel vectors to the clean channel
vectors.

26. The method of claim 25 wherein fitting a 
function comprises performing a linear least squares 
calculation. 

27. The method of claim 26 wherein performing a 
linear least squares calculation comprises: 

determining a distribution parameter for
each mixture component, the
distribution parameter describing the
distribution of noisy channel vectors
associated with the respective mixture
component;

using the distribution parameter to form a
weight value; and

utilizing the weight value in the linear
least squares calculation.

28. The method of claim 27 wherein using the 
distribution parameter to form a weight value 
comprises using the distribution parameter to 
determine a probability of a mixture component given 
a noisy channel vector. 



29. The method of claim 24 wherein determining 
a correction value comprises determining an additive 
correction value and a scaling correction value. 

30. The method of claim 24 wherein grouping the 
noisy channel vectors comprises determining a 
distribution parameter for each mixture component, 
the distribution parameter describing the 
distribution of noisy channel vectors associated with 
the respective mixture component and wherein 
determining a correction value comprises determining 
a correction value based in part on the distribution 
parameters.

31. The method of claim 24 further comprising 
using the correction values to remove noise from an 
input signal through a process comprising: 

converting the input signal into input 
vectors; 

finding a best suited mixture component for 
each input vector; and 

for each input vector, applying to the 
input vector a correction value 
associated with the mixture component 
best suited for the input vector.

METHOD OF NOISE REDUCTION USING
CORRECTION AND SCALING VECTORS WITH 
PARTITIONING OF THE ACOUSTIC SPACE IN 
THE DOMAIN OF NOISY SPEECH 

ABSTRACT OF THE DISCLOSURE 
A method and apparatus are provided for 
reducing noise in a training signal and/or test 
signal. The noise reduction technique uses a stereo
signal formed of two channel signals, each channel 
containing the same pattern signal. One of the 
channel signals is "clean" and the other includes 
additive noise. Using feature vectors from these 
channel signals, a collection of noise correction and 
scaling vectors is determined. When a feature vector 
of a noisy pattern signal is later received, it is 
multiplied by the best scaling vector for that 
feature vector and the best correction vector is 
added to the product to produce a noise reduced 
feature vector. Under one embodiment, the best 
scaling and correction vectors are identified by 
choosing an optimal mixture component for the noisy 
feature vector. The optimal mixture component is
selected based on a distribution of noisy channel
feature vectors associated with each mixture
component.